Abstract A regression modeling method of space weather prediction is proposed. It allows forecasting the Dst index up to 6 hours ahead with about 90% correlation. It can also be used for constructing phenomenological models of interaction between the solar wind and the magnetosphere. With its help two new geoeffective parameters were found: the latitudinal and longitudinal flow angles of the solar wind. It was shown that the Dst index remembers its previous values for 2000 hours.

Keywords space weather; prediction; forecasting; 
magnetic storms; statistics; regression 



1 Introduction 

Humankind has studied space weather for more than 4000 years, starting from the first mentions of auroras in ancient Chinese literature. The term "space weather" itself has existed for almost a century. The official definition adopted by COSPAR states that "Space weather describes the physical processes induced by solar activity that have impact on our terrestrial and space environment, on ground based and space technological systems, and on human activities and health." The first part of this definition actually covers two spatial scales of space weather: when we speak about space weather in space, e.g. in connection with spacecraft failures, we usually mean some local parameters of the environment, and when we speak about space weather on the Earth, e.g. in connection with human health, we usually mean some integral characteristics like the geomagnetic indices. Since this article centers on the variations of the geomagnetic field, the latter meaning will be used. The second part of this definition indicates practical manifestations of space weather.
The impact of space weather on technological systems is generally accepted (see Marubashi (1989)) due to a number of spectacular events, like the superstorm of 1989 when Canada's power grid was disabled for 9 hours, and numerous spacecraft failures due to "killer electrons" causing arcing in electronic components, see Romanova et al. (2005). The impact on human health, however, is disputed by most specialists. Nevertheless, the latest reports (e.g. Khabarova & Dimitrova (2008), Stoupel et al. (2006)) indicate that there is indeed a strong correlation between the rate of sudden cardiac deaths and space weather.

The space weather problem is twofold. The first aspect is purely practical and aims at the prediction and, eventually, mitigation of adverse effects of space weather. Ideally, this task should be accomplished by launching a vast number of spacecraft that would monitor the Sun-Earth region for large-scale structures like CMEs. Unfortunately, humanity's resources are insufficient to produce and maintain such a large space fleet, as well as to process all the data delivered by these spacecraft. So, today we should use the resources at hand, which include a few solar wind spacecraft (ACE, WIND, SOHO, and STEREO), magnetospheric spacecraft (CLUSTER, THEMIS), and ground-based stations (Intermagnet, MAGDAS, etc.), to develop forecast techniques that will be used in the future. Thus, we should try to predict space weather with the data we have, and we should aim for the longest possible prediction times to allow for some kind of countermeasures.

The second aspect is mostly academic and involves the study of the processes in near-Earth space and, specifically, understanding of the interaction between the solar wind and the magnetosphere. Naturally, improving our knowledge of the underlying physics significantly improves predictive capabilities, so fulfilling the second task will significantly help with the first one. Modern conceptions of solar wind-magnetosphere interaction are mostly based on phenomenological models constructed in the 1960s. However, there are numerous questions these models cannot answer. This is largely due to the fact that these models were developed at the very beginning of the space era, when data quality and quantity were immeasurably worse than today. Over more than 40 years we have collected astonishing amounts of data about solar wind parameters and geomagnetic activity, and now it is time to put them to good use.
Fig. 1 Autocorrelation function of Dst. Horizontal lines correspond to top and mean incidental correlation levels in the absence of periodic variations. Gray sine has a period of 1/2 year and depicts seasonal variations



2 Possible approaches to space weather prediction

Space weather prediction is a challenging and nontrivial activity, see Li et al. (2003). The most straightforward approach to space weather prediction is studying the whole complex chain of physical processes involved in magnetospheric dynamics and combining them in a global model of the evolution of the magnetosphere under the influence of the solar wind. Unfortunately, this is not yet possible due to our poor understanding of the physics of the interaction between the solar wind and the magnetosphere. For this reason, different approaches should be tried.



According to Khabarova (2007), today there are several established methods of space weather prediction, listed below.

1. Morphological analysis of solar images. 

This method provides the longest prediction time (up to a week). Its accuracy is unknown since it is used for academic purposes only. Today it is purely manual and thus almost useless for practical applications.

2. Detection of large-scale perturbations in the solar wind, see e.g. Eselevich & Fainshtein (1993), Eselevich et al. (2009).

This method provides a very good prediction time 
(up to several days), but is capable of predicting less 
than 10% of the most intense storms. While it is very 
inaccurate when used alone, it can prove to be useful in 
combination with one of the following short-term meth- 
ods. 

3. Construction of empirical models, see e.g. Burton et al. (1975), Valdivia et al. (1996), O'Brien & McPherron (2000a), O'Brien & McPherron (2000b), Temerin & Li (2002), Temerin & Li (2006), Ballatore & Gonzalez (2003), Cid et al. (2005), Siscoe et al. (2005).



This method provides the shortest prediction time (up to 1 hour) with moderate accuracy (~70%). Potentially this method could demonstrate far better results if the physics behind magnetic storms were better understood.

4. Numerical modeling, see e.g. Liu et al. (2008), McKenna-Lawlor et al. (2008).

This method provides a good prediction time (up to several days), but its accuracy varies within wide limits. The accuracy of these methods is limited by their inability to correctly describe plasma instabilities. Besides, the ring current cannot be described in the framework of ideal MHD, which forms the basis of most numerical models. However, they can adequately describe the motion of, e.g., magnetic clouds in the interplanetary environment, but rely on different methods to detect them.

5. Multidimensional time series analysis. 

This method provides a moderate prediction time (up to several hours) with the highest accuracy (>80%). Such models are very effective and easy to use, but strongly depend on satellite data availability. These are "black box" or "input-output" models, which seek only to reproduce the system's output in response to changes of its inputs. Nevertheless, the model terms are usually physically interpretable and thus useful for the construction of new phenomenological models. For this reason, this method can not only provide a space weather forecast per se, but also improve our knowledge of the physics involved and thus increase the efficiency of other methods.

Hereafter we focus on the last method, keeping in mind that its results can later be used to assist other methods. First of all, let us discuss its existing implementations.

Fig. 2 Distribution of correlation coefficient of Dst at very large time offsets in the absence of periodic variations

Multidimensional time series analysis can be performed using the methods of statistics, signal processing, informatics, fuzzy logic, etc. The most widely used variants are artificial neural networks (e.g. Kugblenu et al. (1999), Watanabe et al. (2002), Wing et al. (2005), Pallocchia et al. (2006)), optimization (e.g. Zhou & Wei (1998), Balikhin et al. (2001), Harrison & Drezet (2001)), and correlation analysis (e.g. Rangarajan & Barreto (1999), Oh & Yi (2004), Wei et al. (2004), Johnson & Wing (2004), Johnson & Wing (2005)).
The neural network approach provides short-term predictions up to 4 hours ahead with a correlation coefficient of 0.79 in the paper by Wing et al. (2005). Earlier implementations of this approach experienced significant difficulties predicting strong geomagnetic storms with Kp > 5, but this approach remains one of the most popular, alongside the empirical methods. The optimization approach seems to be more successful, being able to provide 8-hour predictions in the paper by Harrison & Drezet (2001). However, in the papers based upon the optimization methods the volume of the dataset used is insufficient to correctly describe secular variations of geomagnetic activity. Correlation analysis gives interesting results, but it was used solely for developing and constraining empirical models (see Johnson & Wing (2004)). Nevertheless, most of these methods have a common feature: they lead to a regression relationship at some point, so it seems natural to skip all the preliminary steps and use regression analysis directly, without unnecessary multiplication of entities. Regression analysis itself was attempted earlier by Srivastava (2005), but it was used to estimate the probability of intense/super-intense storm occurrence depending on the solar and interplanetary parameters. Srivastava was able to predict 2 of 4 super-intense and 5 of 5 intense CME-driven storms during the 1996-2002 period, using another 46 CME-driven storms to train his model.

Fig. 3 Dependence of Fisher significance F of the corresponding term in equation (1) on the time offset for the 1st autocorrelation model

Hereafter we propose a new approach, named "regression modeling", which already allows an accurate (about 90% correlation) 6-hour-ahead forecast of the Dst index, which we will use as a quantitative characteristic of space weather. This method can be easily extended to predict other geomagnetic indices like Kp or Ap.



3 Description of the regression modeling 
method 

The proposed method is statistical, but has some features of empirical models. It is based upon regression analysis and mathematical statistics. In its framework the predicted Dst value is sought in the form

$$Dst(j+k) = \sum_i C_i X_i(j), \qquad (1)$$
where j is the number of the current step (number of hours since Jan 1, 1963), k is the prediction length, $C_i$ are the regression coefficients, and $X_i$ are the regressors, which are functions and combinations of input quantities already measured at the time when the prediction is made. The values of $C_i$ are determined by the least squares method (LSM) over a large sample of solar wind and geomagnetic data (see the next section), with equal statistical weights of all points. The statistical significance of the regressors was determined by the Fisher test (F-test) (see Fisher (1954), Hudson (1964)). This test allows separating significant and insignificant regressors. The insignificant regressors are then rejected and the routine is repeated until the regression contains only significant regressors. Of course, this method does not guarantee that all the significant regressors will enter the regression, but physical considerations and brute force in the form of trial and error provide us with the required reliability. The regressors $X_i$ are generally nonlinear, so from the control theory point of view, this method is able to describe discrete dynamical systems with strong nonlinearity. This is an essential feature of the regression modeling method.

Fig. 4 Seasonal variation of Dst

Fig. 5 Diurnal variation of Dst
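The fit-and-reject loop described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's code: the function name `fit_with_backward_elimination` is hypothetical, the regressors are assumed to be precomputed as the columns of a matrix `X`, and the sketch drops only the single least significant regressor per iteration (the text says only that insignificant regressors are rejected and the fit is repeated). The F value of each regressor is computed as its squared t statistic, which is the partial F-test for one coefficient.

```python
import numpy as np


def fit_with_backward_elimination(X, y, names, f_threshold=2.7055):
    """Least-squares fit of y ~ X @ C with iterative rejection of regressors.

    X: (n_samples, n_regressors) array of precomputed regressors; include a
       column of ones if a constant term is wanted.
    names: labels of the columns of X.
    f_threshold: F(1, inf) critical value; 2.7055 corresponds to 90 %.
    Returns {name: (coefficient, F)} for the regressors that survive.
    """
    keep = list(range(X.shape[1]))
    while keep:
        Xk = X[:, keep]
        coef, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ coef
        sigma2 = resid @ resid / (len(y) - len(keep))  # residual variance
        cov = sigma2 * np.linalg.inv(Xk.T @ Xk)        # coefficient covariance
        F = coef**2 / np.diag(cov)                     # partial F = t**2
        worst = int(np.argmin(F))
        if F[worst] >= f_threshold:
            # every remaining regressor is significant: stop
            return {names[i]: (float(c), float(f)) for i, c, f in zip(keep, coef, F)}
        del keep[worst]  # reject the least significant regressor and refit
    return {}


# Synthetic example: y depends on x1 only; x2 is pure noise.
rng = np.random.default_rng(0)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 3.0 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x1, x2])
result = fit_with_backward_elimination(X, y, ["const", "x1", "x2"])
print(result)
```

Removing one regressor at a time is a conservative choice: since all significances change after each refit, a regressor that looks insignificant in a crowded model may become significant once a correlated competitor is gone.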

There is only one manual operation in this method: the selection of regressors to be considered. For this purpose all known models, basic physical considerations, and random choice are used. Naturally, common sense also counts: for example, it would be pointless to add IMF components in GSE and GSM coordinates at the same time. If some regressors $X_i$ appeared to be statistically significant, we also checked the significance of products of their powers, $\prod_i X_i^{p_i}$, where $p_i$ can be any real number, including zero, but for practical purposes we used integer values of $p_i$ in the range from … to 6. This yields a very important feature of the regression analysis: it allows checking the statistical significance of any regressor, which can be useful for verifying different physical hypotheses. In this sense we will call a parameter "geoeffective" if it appears in at least one statistically significant regressor.

Fig. 6 Sum of terms directly describing seasonal variation of Dst

Fig. 7 Sum of terms directly describing diurnal variation of Dst
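Generating candidate products of powers can be sketched as follows. The helper `power_products` is hypothetical; it assumes the already significant regressors are stored as NumPy arrays and restricts the exponents to non-negative integers up to `max_power` (negative powers would additionally require nonzero regressor values). Each resulting column would then go through the same F-test as any other regressor.

```python
import itertools

import numpy as np


def power_products(columns, max_power=2):
    """Candidate regressors prod_i X_i**p_i with integer p_i in 0..max_power.

    columns: dict name -> 1-D array of already significant regressors.
    Skips the constant (all p_i = 0) and the single first powers, which are
    already present in the regression.
    """
    names = list(columns)
    out = {}
    for powers in itertools.product(range(max_power + 1), repeat=len(names)):
        if sum(powers) <= 1:
            continue  # constant term or one of the original regressors
        label = "*".join(f"{n}^{p}" for n, p in zip(names, powers) if p > 0)
        value = np.ones_like(next(iter(columns.values())), dtype=float)
        for n, p in zip(names, powers):
            value = value * columns[n] ** p
        out[label] = value
    return out


cols = {"Bz": np.array([1.0, 2.0, 3.0]), "V": np.array([4.0, 5.0, 6.0])}
candidates = power_products(cols, max_power=2)
print(sorted(candidates))
```

The number of candidates grows as `(max_power + 1) ** len(columns)`, which is why the text limits the search to regressors that are already significant.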

More details of this method can be found in the article Parnowski (2009a).



4 Data and routine 



The OMNI (2009) database was used. It contains IMF, solar wind and geomagnetic data averaged over 1-hour intervals (49 parameters in total, starting from Jan 1, 1963). It was supplemented with provisional Dst data taken from the WDC for Geomagnetism (Kyoto). Thus a continuous 44-year Dst time series was obtained.

We estimated the geoeffectiveness of a parameter from the coefficients and statistical significances of all regressors that contain this parameter. This was done in the following way. After processing the data with the least squares method, the Fisher significance parameter F was determined for each regressor. All the F values were compared to the values 2.7055, 3.84, 5.02, 6.635, 7.879, 10.83 and 12.1, which correspond to statistical significance of 90, 95, 97.5, 99, 99.5, 99.9 and 99.95%, respectively. Then, insignificant regressors were rejected and the routine was repeated until all the regressors were significant. The number of significant regressors depends on the selected significance threshold. All results given herein correspond to the significance threshold of 90%. In contrast to empirical models, we do not add fitting parameters, and all the regressors have physical meaning. The described routine was applied to the sample obtained from the initial dataset after rejecting fill values. This sample can be divided into two subsamples, corresponding to quiet (Dst > -50 nT) and perturbed (Dst < -50 nT) conditions.
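The thresholds listed above coincide with the upper quantiles of the F(1, ∞) distribution, i.e. of χ² with one degree of freedom, which is the large-sample limit of the single-regressor F-test. They can be reproduced with the Python standard library alone; the function name `f1_critical` is ours, and the large-sample limit is an assumption consistent with the quoted values.

```python
from statistics import NormalDist


def f1_critical(level):
    """Critical value of F(1, inf), i.e. of chi-square with one degree of freedom.

    A chi-square(1) variable is Z**2 with Z standard normal, so
    P(Z**2 <= c) = level  <=>  c = z**2, z being the (1 + level) / 2 quantile of Z.
    """
    z = NormalDist().inv_cdf((1 + level) / 2)
    return z * z


for level in (0.90, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995):
    print(f"{level:.2%}: F > {f1_critical(level):.4f}")
```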

Fig. 8 Temporal variation of Dst. Darker spots correspond to lower values

First, we determined which previous Dst values are statistically significant. For this purpose, we constructed an autoregression (see details in Parnowski (2009b))

$$Dst(j+k) = C_0 + \sum_{i=1}^{N} C_i\, Dst(j-i+1), \qquad (2)$$

where N is the "age" of the oldest Dst value. This model alone is not sufficient to correctly predict Dst, but it sets a basis for the construction of models that
are able to do so. Let us determine the maximum reasonable value of N. For this purpose, we plot the autocorrelation function (ACF) of the Dst index for k = 1 (see Fig. 1). One can see that the ACF tends to a sinusoid with a period close to half a year. This is caused by seasonal variations. This raises a question: if there were no temporal variations, what would the ACF tend to at large offsets? If the distribution of Dst were normal, the answer would be zero. However, the distribution is not normal, so the ACF can tend to some non-zero quantity. To determine this quantity we need to remove temporal variations. For this purpose we need to calculate the ACF of a random sample with the same statistical characteristics as the Dst sample. The easiest way to get such a sample is to process the Dst sample with a permutation method, which is widely used for determination of correlation functions, e.g. in astronomy. This method is based on random shuffling of the sample. Applying this method many times (10000 times in our case) and calculating the correlation coefficient each time, we get the distribution of the correlation coefficient by the Monte Carlo method. The distribution of the correlation coefficient for this sample (Fig. 2) turned out to be very close to a normal distribution with mean 0.008 and variance 5.1 · 10^-6. The maximum recorded value in 10000 trials was equal to 0.015. The top and the mean values are depicted in Fig. 1 by horizontal lines. As one can see in Fig. 1, in reality the correlation coefficient exceeds this value most of the time due to temporal variations. The ACF crosses the top line for the first time at ≈6000 hours, though the difference between the ACF and the sine with a half-year period crosses it at ≈2000 hours, which is roughly three 27-day periods, so we will take the latter value as a rough estimate of N. This hints that rather old Dst values can be quite significant. Besides the half-year periodicity one can also notice the 27-day periodicity caused by the Carrington rotation of the Sun, which can be taken into account by adding the sunspot number R to the regression.
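The shuffling procedure can be sketched in plain Python. This is an illustration, not the author's implementation: `lag_correlation` and `shuffled_lag_correlations` are hypothetical names, the trial count in the example is much smaller than the 10000 used in the text, and the synthetic series stands in for the Dst record.

```python
import math
import random


def lag_correlation(x, lag):
    """Pearson correlation between x[t] and x[t + lag]."""
    a, b = x[:-lag], x[lag:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def shuffled_lag_correlations(x, lag, trials=1000, seed=0):
    """Monte Carlo distribution of the lag correlation of randomly permuted
    copies of x: a permutation destroys all temporal structure but keeps
    the distribution of values."""
    rng = random.Random(seed)
    y = list(x)
    out = []
    for _ in range(trials):
        rng.shuffle(y)
        out.append(lag_correlation(y, lag))
    return out


# Synthetic stand-in for Dst: a slow oscillation plus noise.
noise = random.Random(1)
series = [math.sin(t / 50) + noise.gauss(0, 0.3) for t in range(2000)]
null = shuffled_lag_correlations(series, lag=1, trials=200)
print(lag_correlation(series, 1), sum(null) / len(null), max(null))
```

The mean and the maximum of `null` play the role of the two horizontal lines in Fig. 1: any lag correlation of the real series above the maximum cannot be explained by the value distribution alone.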

Let us return to equation (2). Applying the F-test, we can determine which previous Dst values are statistically significant (see Fig. 3). We did not search for statistically significant Dst values for N > 900, but it is possible that there are even older statistically significant values. A similar situation was reported by Johnson & Wing (2004) regarding Kp: "the significance is often quite large for extended periods of time (10-20 days)". As our analysis shows, the Dst index can "remember" its previous values for significantly longer periods of time. In fact, after adding regressors corresponding to satellite data, some of the previous Dst values become insignificant. We found that after the addition of these regressors there are still statistically significant values as far as 801 hours ago (33 days and 9 hours) for k = 1. The statistical significance of this oldest value is over 99.9%.
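Building the design matrix of equation (2) from an hourly series takes only a few lines. The helper below is hypothetical and assumes a contiguous hourly NumPy array with no missing hours; its output can be passed to any least-squares routine and then screened with the F-test.

```python
import numpy as np


def autoregression_design(dst, n_lags, k):
    """Design matrix X and target y for
    Dst(j + k) = C_0 + sum_{i=1..N} C_i * Dst(j - i + 1).

    Row for hour j is [1, Dst(j), Dst(j - 1), ..., Dst(j - N + 1)] and the
    target is Dst(j + k). Assumes dst is a contiguous hourly series.
    """
    dst = np.asarray(dst, dtype=float)
    js = np.arange(n_lags - 1, len(dst) - k)  # hours with full history and a target
    lagged = [dst[js - i + 1] for i in range(1, n_lags + 1)]
    X = np.column_stack([np.ones(len(js))] + lagged)
    y = dst[js + k]
    return X, y


X, y = autoregression_design(np.arange(10.0), n_lags=3, k=2)
print(X[0], y[0])
```

With N of several hundred lags, as in the text, the matrix becomes wide but remains small compared with the 44-year hourly sample.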

At this point we already have a large number of regressors describing just the previous Dst values (autoregression), without satellite data and nonlinear terms. Adding those will only increase the number of regressors.

Fig. 9 Distribution of the latitudinal flow angle and the corresponding mean Dst values

After determining which previous Dst values are statistically significant, we added all the solar wind parameters available in the OMNI 2 database. Then, we added nonlinear terms as discussed in Section 3. After adding a new regressor, all the significances are recalculated, and some of the old regressors can become insignificant. The total number of regressors is about 150-200. Since this is very large, we will not give here any lists of regressors or coefficients, even for the simplest case k = 1, but a preliminary list is given in Parnowski (2008).



5 Identification of new geoeffective parameters 

In this section we will demonstrate how this method 
can be used for identification of geoeffective parameters. 
We will use four parameters as an example: DOY (day 
of the year), UT (universal time) and latitudinal and 
longitudinal flow angles of the solar wind. 

In Fig. 1 one can see a clear seasonal dependence of the Dst index. This dependence was described in many articles, for example by O'Brien & McPherron (2002), Lyatsky et al. (2001), Takalo & Mursula (2001), and Cliver et al. (2000), but the reason behind it is still disputed. Most authors believe these asymmetries are caused by one of the two cusps turning to the sunlit side due to the annual motion of the Earth around the Sun. However, O'Brien & McPherron (2002) state that this mechanism would give only 17% of the observed asymmetry. Takalo & Mursula (2001) connected the diurnal variations of Dst with the uneven distribution of Dst network stations. Let us use this known effect to validate our method.

Fig. 10 Sum of terms describing the latitudinal flow angle

Fig. 11 Seasonal dependence of latitudinal flow angle's input in Dst

If we select two subsamples corresponding to summer and winter in the northern hemisphere, bounded by the vernal and autumnal equinoxes, and verify the hypothesis that the difference between the corresponding average Dst values is statistically significant using a one-sided Student test, we obtain $t_\infty$ = 80.264, which is well over 99.95% significant. The values of $t_\infty$ corresponding to the 99 and 99.95% significance levels are equal to 2.334 and 3.31, respectively. For the diurnal asymmetry the Student test gives $t_\infty$ = 8.774, which corresponds to a significance level of more than 99.95%. Note that formally the Student test is applicable only when Dst is normally distributed. In fact, the distribution of Dst is slightly asymmetric, but given the obtained values of $t_\infty$, we can be confident in the qualitative conclusions. Figs. 4 and 5 show the histograms of seasonal and diurnal variations of the Dst index.
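A one-sided large-sample Student test of this kind can be sketched as follows. The text does not state whether pooled or separate variances were used; this sketch assumes the unpooled (Welch-type) standard error and reads the significance from the normal limit, which is presumably what $t_\infty$ denotes for samples of this size. The function name `one_sided_t` is ours.

```python
from statistics import NormalDist, fmean, variance


def one_sided_t(sample_a, sample_b):
    """t statistic for H1: mean(a) > mean(b) with an unpooled standard
    error, and its one-sided significance in the large-sample (normal) limit."""
    se = (variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)) ** 0.5
    t = (fmean(sample_a) - fmean(sample_b)) / se
    return t, NormalDist().cdf(t)


t, significance = one_sided_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t, 4), round(significance, 4))
```

For real use the two samples would be, e.g., the hourly Dst values of northern summer and northern winter; with tens of thousands of points per sample the normal limit is an excellent approximation to the t distribution.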

Taking this known geoeffective factor as an example, we demonstrate how easily one can take it into account using the regression approach. To do so, one should simply add the terms $a_1(j) = \sin((j-1920)\pi/4383)$ and $a_2(j) = \cos((j-1920)\pi/4383)$ to the regression. Here j is once again the number of hours since Jan 1, 1963, 1920 is the number of hours between the beginning of the year and the vernal equinox, and 4383 is the number of hours in half a year. The first of these terms is significant and describes the summer/winter asymmetry, and the second one (which appears statistically insignificant) describes an absent spring/autumn asymmetry. Likewise, for the diurnal asymmetry the corresponding terms will be $b_1(j) = \sin((j-2)\pi/12)$ and $b_2(j) = \cos((j-2)\pi/12)$. Here 2 is the time difference between UT and the northern geomagnetic pole, and 12 is the number of hours in half a day. Both these terms are significant.

Fig. 12 Distribution of the longitudinal flow angle and the corresponding mean Dst values. Grey columns correspond to quiet conditions, white columns to all data
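The four harmonic regressors can be computed directly from the hour index j. A minimal sketch in Python, with the constants taken from the text; the function and constant names are ours.

```python
import math

HOURS_TO_EQUINOX = 1920  # hours from the beginning of the year to the vernal equinox
HALF_YEAR = 4383         # hours in half a year
UT_SHIFT = 2             # hours, the diurnal phase offset given in the text
HALF_DAY = 12            # hours in half a day


def harmonic_terms(j):
    """Return (a1, a2, b1, b2) for hour j counted from 1 Jan 1963, 00 UT."""
    seasonal = (j - HOURS_TO_EQUINOX) * math.pi / HALF_YEAR
    diurnal = (j - UT_SHIFT) * math.pi / HALF_DAY
    return math.sin(seasonal), math.cos(seasonal), math.sin(diurnal), math.cos(diurnal)


# The product a1 * b1, also found significant, is simply one more regressor.
a1, a2, b1, b2 = harmonic_terms(1920 + 6)
print(a1 * b1)
```

Since $a_1$ has a period of 2 · 4383 hours (one year), it is zero at the equinoxes and reaches +1 near the northern summer solstice, which is why it captures the summer/winter contrast.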

The coefficient of the $a_1(j)$ term is 30 times smaller than the observed difference between the mean Dst values of the summer and winter subsamples. This can be explained in the following way: there are other regressors which depend on parameters with a statistically significant summer/winter asymmetry, e.g. previous Dst values. They provide the lion's share of the summer/winter asymmetry of Dst. A good example of such a regressor is the sunspot number R, which describes the 27-day periodicity. Nevertheless, there is a small difference which cannot be expressed with these terms. Including it in the regression, we obtain these statistically significant regressors. To further illustrate this point, let us consider as an example a value $X = \mathrm{const} + A\sin\omega t$. In the regression it will look like

$$X_{n+1} = X_n + A\left[\sin\omega(t+\Delta t) - \sin\omega t\right] = X_n + A\left[(\cos\omega\Delta t - 1)\sin\omega t + \cos\omega t\,\sin\omega\Delta t\right].$$

The first term in brackets is of order $(\omega\Delta t)^2$, and the second of order $\omega\Delta t$, under the natural assumption that $\omega\Delta t \ll 1$. So, it will seem that the coefficient is $A\omega\Delta t$ rather than $A$. Note that this is just an example and has nothing to do with actual regressors.

Fig. 13 Sum of terms describing the longitudinal flow angle

Fig. 14 Seasonal dependence of longitudinal flow angle's input in Dst
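The small-angle argument can be checked numerically. The sketch below uses an arbitrarily chosen amplitude and an annual angular frequency with a 1-hour step (all three are illustrative assumptions), verifies the trigonometric identity at a few instants, and compares the two bracketed coefficients with their leading-order approximations.

```python
import math

# Illustrative values (assumptions): amplitude, annual angular frequency in rad/hour, 1-hour step.
A = 5.0
omega = 2 * math.pi / 8766.0
dt = 1.0

sin_coef = A * (math.cos(omega * dt) - 1)  # coefficient of sin(omega t), ~ -A (omega dt)**2 / 2
cos_coef = A * math.sin(omega * dt)        # coefficient of cos(omega t), ~ A omega dt


def increment(t):
    """X(t + dt) - X(t) for X = const + A sin(omega t)."""
    return A * (math.sin(omega * (t + dt)) - math.sin(omega * t))


# The identity X_{n+1} - X_n = sin_coef * sin(omega t) + cos_coef * cos(omega t) holds for any t.
for t in (0.0, 1000.0, 4383.0):
    rhs = sin_coef * math.sin(omega * t) + cos_coef * math.cos(omega * t)
    assert math.isclose(increment(t), rhs, rel_tol=0.0, abs_tol=1e-12)

print(cos_coef / A, omega * dt)               # apparent coefficient relative to A
print(sin_coef / A, -((omega * dt) ** 2) / 2)
```

With an hourly step and an annual period, $\omega\Delta t \approx 7 \cdot 10^{-4}$, so the apparent coefficient of the harmonic term is more than three orders of magnitude smaller than the true amplitude $A$.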

However, if we look at the distribution of mean Dst values vs. time of the year (Fig. 4), we see a much more complicated pattern of seasonal variations of geomagnetic activity. Among other features, there is a strong asymmetry between summer and winter on one side and spring and autumn on the other. To take it into account, we introduced additional terms into our regression, which are powers of $a_1(j)$ and their products with powers of $a_2(j)$. The sum of these regressors with the corresponding coefficients, depicted in Fig. 6, is very similar to Fig. 4. Note that Fig. 6 was obtained independently from Fig. 4.

We did the same with the diurnal asymmetry. The distribution is plotted in Fig. 5, and the sum of regressors with the corresponding coefficients in Fig. 7. The term $a_1(j) \cdot b_1(j)$ is also significant and should be included in the regression. After this we obtained a joint distribution of semiannual and diurnal variations of the Dst index, plotted in Fig. 8. It contains 18 regressors. By increasing the number of regressors describing temporal variations of geomagnetic activity, we can improve the accuracy of this distribution. In particular, one could add the 11-year and 22-year solar cycles, higher powers of $a_1(j)$ and $b_1(j)$, etc.

Fig. 15 Comparison between the prediction results of O'Brien & McPherron (2000c) (top) and ours (bottom) 1 hour ahead. The following designations are used: 'Kyoto' - official Dst index, available at Kyoto WDC for Geomagnetism; 'AK1' - prediction based on the model of Burton et al. (1975) with re-calculated coefficients; 'UCB' - prediction based on the model of Fenrich & Luhmann (1998); 'AK2' - prediction based on the model of O'Brien & McPherron (2000b); 'ACE Gaps' refers to the top line, indicating the availability of solar wind data measured by the ACE satellite

Thus, we have demonstrated how easily one can take new geoeffective parameters into account within this method's framework.

Now let us discuss the parameters whose geoeffectiveness was determined by this method, and demonstrate that they are indeed geoeffective.

The latitudinal flow angle $\theta_v$ was mostly associated with the southward component of the IMF. I plotted the distribution of its values and the corresponding mean Dst values in Fig. 9. The distribution looks similar to a normal distribution, but it differs significantly from the normal one according to a goodness-of-fit test. This manifests itself in a much larger number of points with deviations exceeding σ than follows from the normal distribution. This is mostly caused by the number of points in the wing bins $|\theta_v - \langle\theta_v\rangle| > 4\sigma$ being equal to 196, versus 11 points expected for a normal distribution. However, most of these points were obtained in the 1960s, when the quality of measurements was much worse than today. This period includes the minimum and maximum values of $\theta_v$, equal to -59.7° and 18.8°. Nevertheless, these points constitute only a minor fraction of all points and did not affect the linear regression routine. Assuming a normal distribution, we obtain σ = 2.925 and $\langle\theta_v\rangle$ = 0.27 < 0.1σ. Thus, the distribution is only insignificantly shifted towards positive values.

Fig. 16 Error charts of prediction results of O'Brien & McPherron (2000a) (top) and ours (bottom) 1 hour ahead. The error chart for our 9-hour prediction is plotted for reference

If we ignore the wing bins in the distribution of mean Dst values against $\theta_v$, which are somewhat random due to the small number of points in them, we notice a slight, almost linear trend. If we plot the sum of terms containing $\theta_v$ (Fig. 10), we notice a similar trend. If we select two subsamples, one with $-8 < \theta_v < -5$ and the other with $4 < \theta_v < 9$, and verify the hypothesis that the difference between the corresponding average Dst values is statistically significant using a one-sided Student test, we obtain $t_\infty$ = 6.278, which is over 99.95% significant.

If we divide the sample into two subsamples, one for northern summer and one for northern winter, we obtain the picture shown in Fig. 11. Once again, let us disregard the wing bins. What do we see? The summer distribution has an obvious linear trend, but the winter one does not. If we apply the Student test to the same intervals now, we obtain $t_\infty$ = 5.44 in the summer and only 0.059 in the winter. The former corresponds to more than 99.95% significance, while the latter corresponds to less than 10%. This could mean that there are two factors connected with the latitudinal flow angle, which work together in the summer and against each other in the winter. The physical explanation of this phenomenon, however, lies beyond the scope of this paper.
Fig. 18 Comparison between the prediction results of Pallocchia et al. (2006) (top) and ours (bottom) 1 hour ahead. Our 3-hour prediction is given for reference. On the top plot the black line depicts Dst from Kyoto WDC, and the blue line the 1-hour prediction



The longitudinal flow angle $\psi_v$ was only occasionally used in models. However, it turned out to be even more significant than the latitudinal flow angle. Its distribution together with the corresponding mean Dst values is plotted in Fig. 12, where white bars show the complete dataset excluding rejected points, and the gray bars show the quiet-time sample with Dst > -50 nT. Like the latitudinal flow angle, the distribution of the longitudinal flow angle resembles a normal distribution. However, a goodness-of-fit test rejects the corresponding null hypothesis. Once again, this is mostly due to the wing bins, which are mostly formed of data points corresponding to measurements in the 1960s, including the minimum and maximum values equal to -65.6° and 48.5°. Assuming a normal distribution, we obtain σ = 2.934 and $\langle\psi_v\rangle$ = -0.30 ≈ -0.1σ.

A significant trend is the most prominent feature of this figure. If we plot the sum of regressors which contain $\psi_v$ (Fig. 13), we see a very similar trend. As before, we plot the distribution for the summer and winter subsamples separately (Fig. 14). We see that the trend is identical on both plots, so the corresponding effect is season-independent. The list of regressors for k = 1 containing $\theta_v$ and $\psi_v$ is given in Table 1. It contains the regressors themselves, their coefficients and F values.

Thus, we have demonstrated that our method is indeed capable of pointing out new geoeffective parameters, and we have verified the geoeffectiveness of two such quantities.



6 Prediction results 

Taking into account the parameters considered above, together with parameters whose geoeffectiveness is beyond doubt, such as previous values of Dst, the dawn-dusk electric field, the ram pressure of the solar wind, and most other parameters from the OMNI2 database, we constructed models for predicting Dst 1, 3, 6, 9, 12, 18, and 24 hours ahead, and 3 more models for predicting Dst 1 hour ahead: for quiet conditions, for perturbed conditions, and for the case when satellite data are unavailable (autoregression; see more in Parnowski (2009b)). The statistical characteristics of these models are summarized in Table 2. They include the root-mean-square residual (RMS), the linear correlation coefficient (LC), and the prediction efficiency (PE = 1 − RMS²/SD², where SD is the standard deviation of the sample). In divided cells, the top number corresponds to the actual model, and the bottom one to the simplest possible model, Dst(j + k) = Dst(j). It is noteworthy that, despite good correlation for all the models, in practice only the 1-hour and 3-hour models are ready for operational use, and the 6-hour model can potentially reach this state. This is due to a significant time shift present in the longer-range models.
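The three statistics and the persistence baseline can be written out directly. The sketch below is a hedged illustration: it assumes that SD is the population standard deviation of the observed Dst over the evaluation sample and that RMS is taken over the forecast residuals; the function names are hypothetical. The toy series shows why good correlation alone is not enough: a forecast that is merely the observed series shifted in time can have LC = 1 while PE is negative.

```python
import math
import statistics


def rms(pred, obs):
    """Root-mean-square residual, in the units of Dst (nT)."""
    return math.sqrt(statistics.fmean((p - o) ** 2 for p, o in zip(pred, obs)))


def lc(pred, obs):
    """Pearson linear correlation coefficient."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    vp = sum((p - mp) ** 2 for p in pred)
    vo = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(vp * vo)


def pe(pred, obs):
    """Prediction efficiency PE = 1 - RMS^2 / SD^2 (population SD of obs)."""
    return 1.0 - rms(pred, obs) ** 2 / statistics.pstdev(obs) ** 2


def persistence(dst, k):
    """Baseline Dst(j + k) = Dst(j) for k >= 1: return (forecasts, observations)."""
    return dst[:-k], dst[k:]


# Hypothetical hourly Dst (nT) during a steady decline.
dst = [0.0, -10.0, -20.0, -30.0]
pred, obs = persistence(dst, k=1)
print(rms(pred, obs), lc(pred, obs), pe(pred, obs))
```

Here RMS = 10 nT, LC = 1 and PE = −0.5: the shifted forecast is perfectly correlated with the observations, yet it is worse than simply forecasting the sample mean.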

Table 1 Regressors containing the flow angles. V(j) is the bulk flow velocity of the solar wind

Note that since the proposed method is statistical, it makes little difference whether the "training" sample contains the period for which the prediction is made (the "test" sample) or not. To illustrate this point, consider an example: we predicted Dst 3 hours ahead for the "test" sample from 2007 to 2008 using 3 "training" samples. The sample from 1976 to 2003 gives LC = 0.851, the sample from 1976 to 2006 gives LC = 0.854, and the sample from 1976 to 2008 gives LC = 0.854 as well. The first two "training" samples do not contain the "test" sample, yet their results differ slightly. The third "training" sample contains the "test" sample, yet its result is the same as for the second one. We conclude that the volume and statistical properties of the "training" sample affect the prediction results more strongly than the inclusion of the "test" sample does, so including the "test" sample in the "training" sample has little or no effect on LC. PE is calculated independently of the "test" sample and is not affected by its selection in any way.
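The experiment above can be mimicked on synthetic data: fit a regression on several training windows and evaluate LC on one fixed test window. The sketch below is an assumption-laden stand-in for eq. (1): it uses ordinary least squares with only two hypothetical regressors plus an intercept, solved from the centred normal equations, and synthetic data from a seeded random generator; the window boundaries are arbitrary. Two regressors are used because with a single regressor the test LC would not depend on the training window at all (an affine rescaling of a forecast with positive slope leaves LC unchanged).

```python
import math
import random
import statistics


def fit_two(x1, x2, y):
    """OLS for y = c0 + c1*x1 + c2*x2 via the centred 2x2 normal equations."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # nonzero unless x1 and x2 are collinear
    c1 = (s1y * s22 - s2y * s12) / det  # Cramer's rule
    c2 = (s2y * s11 - s1y * s12) / det
    return my - c1 * m1 - c2 * m2, c1, c2


def pearson(a, b):
    """Pearson linear correlation coefficient (LC)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


def lc_on_test(x1, x2, y, train, test):
    """Fit on the `train` slice and return LC of the forecast on the `test` slice."""
    c0, c1, c2 = fit_two(x1[train], x2[train], y[train])
    pred = [c0 + c1 * a + c2 * b for a, b in zip(x1[test], x2[test])]
    return pearson(pred, y[test])


rng = random.Random(42)
n = 400
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The x1 coefficient drifts slowly, so different training windows give different fits.
y = [(1.0 + i / n) * a - 2.0 * b + rng.gauss(0.0, 0.5) for i, (a, b) in enumerate(zip(x1, x2))]

test = slice(350, 400)
# The last training window contains the test window, like the 1976-2008 sample.
for train in (slice(0, 300), slice(0, 350), slice(0, 400)):
    print(train, round(lc_on_test(x1, x2, y, train, test), 3))
```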

We also compare our predictions graphically with the results of other authors: O'Brien & McPherron (2000a) (Figs. 15 and 16), Cid et al. (2005) (Fig. 17), and Pallocchia et al. (2006) (Fig. 18). Note that the intervals for prediction were selected by the authors of the original papers, and the coefficients in eq. (1) were the same throughout and for all the figures. The figures show that our method provides a much more precise forecast than most empirical models and typical neural network models. Remarkably, even our 9-hour model is more precise than most empirical 1-hour models. The autoregression model (Parnowski 2009b), though, lags in the left part of the plot because of a rapid positive change of Dst at 1500 UT. The lag persists through the growth phase and the main phase, and vanishes only in the recovery phase. For this reason, the autoregression model holds little practical value and should be considered a transitional result, required to construct the full model. It is, however, possible to improve it by adding terms describing temporal variations and, for example, the number of sunspots, but then the term "autoregression" would no longer apply.

Table 2 Statistical characteristics of forecasting models

To verify the efficiency of our method, Fig. 19 presents the results of prediction 3, 6, and 9 hours ahead for a number of events, kindly selected for us by V.G. Fainshtein, which are particularly hard to predict with medium-term methods such as that of Eselevich et al. (2009). The accuracy of the method is higher for stronger storms, which are of greater interest. A major advantage of this method is that the most resource-demanding operation, the calculation of the regression coefficients, has to be performed only once for each model. The prediction itself is just the evaluation of a polynomial, which usually takes no more than 4 to 6 seconds on an average PC (including disk I/O); this allows for the creation of fully automated operational online space weather forecast services.
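Because the coefficients are computed once, the operational step reduces to evaluating a fixed polynomial in the latest inputs. The sketch below assumes that each regressor is stored as a coefficient and a map from variable names to integer powers; the name `predict`, the variable names, and the coefficient values are hypothetical toy values without physical units, and in a real service the coefficient list would be loaded from disk once rather than written inline.

```python
import math
from typing import Dict, List, Tuple

# (coefficient, {variable name: integer power}); an empty map is the intercept.
Model = List[Tuple[float, Dict[str, int]]]


def predict(model: Model, inputs: Dict[str, float]) -> float:
    """Evaluate sum_i c_i * prod_v inputs[v] ** p_iv for one forecast step."""
    return sum(c * math.prod(inputs[v] ** p for v, p in powers.items()) for c, powers in model)


# Hypothetical coefficients, computed offline once per model.
model_1h: Model = [
    (5.0, {}),
    (0.9, {"dst": 1}),
    (-0.5, {"ey": 1, "v": 1}),
]

latest = {"dst": -20.0, "ey": 2.0, "v": 4.0}
print(predict(model_1h, latest))
```

Each forecast costs a single pass over the coefficient list, so its run time grows only linearly with the number of regressors.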



7 Conclusion 

The proposed regression approach proved to be well suited for space weather forecasting. For forecasting per se, its main advantages are good correlation (about 90% for the 6-hour forecast), adaptability to any sample, and very fast forecasting code (typically about 5 seconds on an average PC). For the identification of geoeffective parameters, it is extremely convenient and easy to use. In particular, it allowed us to uncover 2 new geoeffective parameters: the latitudinal and the longitudinal flow angles of the solar wind.

This is just a short summary of the regression modeling method, since its full description would take much more space. Of course, this method can be used in conjunction with other methods, primarily with physical methods for detecting large-scale perturbations in the solar wind and with empirical models.